3.2  EXAMPLES OF DE NOVO ASSEMBLERS

The above briefly defines the three kinds of algorithms used for de novo genome assembly. For the algorithms themselves, you may need to refer to an algorithms textbook. In the following, we discuss some example assemblers to show how de novo genome assembly is performed. Many papers have reported the comparative performance of these assemblers and others; readers can refer to those papers to see the differences found by researchers.

3.2.1  ABySS

ABySS (Assembly By Short Sequences) [8] is a parallel de novo genome assembler developed to assemble very large datasets of short reads produced by NGS technologies. It performs assembly in two stages. First, it generates all possible k-mers from the reads, removes potential errors, and builds contigs using de Bruijn graphs. Second, it uses mate-pair information to extend contigs by exploiting contig overlaps, and it merges the unambiguously connected graph nodes. ABySS can be used in two modes: Bloom filter mode, which uses hashing to store k-mers in a memory-efficient Bloom filter, and MPI mode, which uses the Message Passing Interface (MPI) to parallelize the de novo assembly. The Bloom filter mode is recommended over the legacy MPI mode because it reduces memory usage roughly 10-fold. ABySS can be installed following the instructions available at “https://github.com/bcgsc/abyss”. On Ubuntu, we can install it using “sudo apt-get install abyss”. Once ABySS has been installed, the “abyss-pe” command can be

FIGURE 3.5  De Bruijn graphs.
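As a toy illustration of the first stage described for ABySS (k-mer generation, error removal, and contig building on a de Bruijn graph), the following minimal Python sketch may help. It is not ABySS's implementation: the function names, the `min_count` threshold used as a crude stand-in for error removal, and the example reads are all assumptions made for illustration, and the sketch ignores reverse complements, paired reads, and isolated cycles.

```python
from collections import Counter, defaultdict


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_graph(reads, k, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    k-mers seen fewer than min_count times are discarded; this threshold
    is a hypothetical simplification of real error removal.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    graph = defaultdict(set)
    for km, n in counts.items():
        if n >= min_count:
            graph[km[:-1]].add(km[1:])  # prefix node -> suffix node
    return graph


def contigs(graph):
    """Merge unambiguously connected nodes into contig strings."""
    indeg = Counter(v for succs in graph.values() for v in succs)
    nodes = set(graph) | set(indeg)

    def is_simple(node):
        # Exactly one incoming and one outgoing edge: safe to merge through.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    result = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # contigs start only at branch points or path ends
        for nxt in sorted(graph.get(start, ())):
            path = start + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            result.append(path)
    return result


# Hypothetical overlapping reads drawn from the sequence ATGGCGTGCAAT.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
print(contigs(build_graph(reads, k=4)))  # ['ATGGCGTGCAAT']
```

With these three reads and k = 4, every node has at most one predecessor and one successor, so the whole path collapses into a single contig. Real data contain repeats and sequencing errors that create branches, which is where the second, mate-pair stage becomes necessary.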